Evaluating Speech-to-Text Systems with PennSound

Wright, Jonathan, Liberman, Mark, Ryant, Neville, Fiumara, James

arXiv.org Artificial Intelligence

A random sample of nearly 10 hours of speech from PennSound, the world's largest online collection of poetry readings and discussions, was used as a benchmark to evaluate several commercial and open-source speech-to-text systems. PennSound's wide variation in recording conditions and speech styles makes it representative of many other untranscribed audio collections. Reference transcripts were created by trained annotators, and system transcripts were produced from AWS, Azure, Google, IBM, NeMo, Rev.ai, Whisper, and Whisper.cpp. By word error rate (WER), Rev.ai was the top performer overall, and Whisper was the top open-source performer (provided hallucinations were avoided). AWS had the best diarization error rate (DER) among the three systems evaluated for diarization. However, the WER and DER differences were slim, and various tradeoffs may motivate choosing different systems for different end users. We also examine the issue of hallucinations in Whisper: its users should be aware of the relevant runtime options and whether the speed-versus-accuracy tradeoff is acceptable.
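Word error rate, the headline metric in this evaluation, is the word-level edit distance (substitutions + insertions + deletions) between a reference and a hypothesis transcript, divided by the number of reference words. A minimal sketch of the computation (the function name `wer` is our own, not from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions)
    divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Single-row dynamic-programming edit distance over word sequences.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,          # delete a reference word
                      d[j - 1] + 1,      # insert a hypothesis word
                      prev + (r != h))   # substitute (free if a match)
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion over six reference words
```

Production scoring toolkits additionally normalize case, punctuation, and contractions before alignment, which can shift reported WER noticeably.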


The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems

Liesenfeld, Andreas, Lopez, Alianda, Dingemanse, Mark

arXiv.org Artificial Intelligence

Speech recognition systems are a key intermediary in voice-driven human-computer interaction. Although speech recognition works well for pristine monologic audio, real-life use cases in open-ended interactive settings still present many challenges. We argue that timing is mission-critical for dialogue systems, and evaluate 5 major commercial ASR systems for their conversational and multilingual support. We find that word error rates for natural conversational data in 6 languages remain abysmal, and that overlap remains a key challenge (study 1). This especially impacts the recognition of conversational words (study 2), and in turn has dire consequences for downstream intent recognition (study 3). Our findings help to evaluate the current state of conversational ASR, contribute towards multidimensional error analysis and evaluation, and identify the phenomena that need the most attention on the way to building robust interactive speech technologies.


How to Build a Speech-to-Text System using ChatGPT and Python - Pyresearch - Medium

#artificialintelligence

Check out our latest tutorial on how to build a speech-to-text system using ChatGPT and Python! Learn how to leverage the power of natural language processing and deep learning to convert audio to text with impressive accuracy.


Voice-based applications for E-Health

#artificialintelligence

Healthcare has been one of the countless beneficiaries of the revolutionary advances that widespread computing has brought. Fast, efficient data organization, storage, and access have greatly sped up the medical enterprise, yet much low-hanging fruit remains. Chief among it is the increased application of technologies that can process speech. In this post, we'll share with you how speech technology can improve healthcare in the three following ways. Finally, (3) voice signal analysis can be used for earlier diagnosis and to help track changes in medical conditions over time.


Usage of speaker embeddings for more inclusive speech-to-text

AIHub

English is one of the most widely used languages worldwide, with approximately 1.2 billion speakers. To maximise the performance of speech-to-text systems, it is vital to build them in a way that recognises different accents. Recently, spoken dialogue systems have been incorporated into various devices such as smartphones, call services, and navigation systems. These intelligent agents can assist users in performing daily tasks such as booking tickets, setting up calendar items, or finding restaurants via spoken interaction. They have the potential to be more widely used in a vast range of applications in the future, especially in the education, government, healthcare, and entertainment sectors.


AI learns how to fool speech-to-text. That's bad news for voice assistants

#artificialintelligence

A pair of computer scientists at the University of California, Berkeley developed an AI-based attack that targets speech-to-text systems. With their method, no matter what an audio file sounds like, the text output will be whatever the attacker wants it to be. This one is pretty cool, but it's also another entry for the "terrifying uses of AI" category. The team, Nicholas Carlini and Professor David Wagner, were able to trick Mozilla's popular DeepSpeech open-source speech-to-text system by, essentially, turning it on itself: "Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (at a rate of up to 50 characters per second) … Our attack works with 100% success, regardless of the desired transcription, or initial source phrase being spoken."
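The optimization idea behind such attacks is gradient descent on the input itself: minimize a loss that combines the target-output objective with a penalty on the size of the perturbation. The toy sketch below illustrates this against a stand-in linear model, not DeepSpeech; the weights, step size, penalty weight, and threshold are all made up for illustration (the real attack differentiates the model's CTC loss with respect to the waveform):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)   # stand-in model weights (illustrative)
x = rng.normal(size=16)   # the original "audio" signal

def score(v):
    # Stand-in differentiable model output; the attack's goal here is
    # simply to drive this below an arbitrary target threshold.
    return float(w @ v)

delta = np.zeros_like(x)  # adversarial perturbation, starts at zero
for _ in range(200):
    # Gradient of the loss: score(x + delta) + 0.1 * ||delta||^2
    # (push the output toward the target while keeping delta small)
    grad = w + 0.2 * delta
    delta -= 0.05 * grad
    if score(x + delta) < -1.0:   # illustrative "target reached" test
        break

print(round(score(x), 3), round(score(x + delta), 3))
```

The distortion penalty is what makes the perturbed waveform "over 99.9% similar" to the original: the optimizer stops growing the perturbation as soon as the target output is reached.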